Idiap Research Report: Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios
Authors
Abstract
This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two contrasting acoustic scenarios: (a) a Large Vocabulary Continuous Speech Recognition (LVCSR) task on well-resourced English meeting data, and (b) acoustic modeling of under-resourced Afrikaans telephone data. In both cases, the performance of SGMMs is compared with a conventional context-dependent HMM/GMM approach that exploits the same kind of information during training. The LVCSR systems are evaluated on the standard NIST Rich Transcription dataset. For under-resourced Afrikaans, the SGMM and HMM/GMM acoustic systems are additionally compared to KL-HMM and multilingual Tandem techniques boosted with supplemental out-of-domain data. Experimental results clearly show that the SGMM approach, which has considerably fewer model parameters, outperforms the conventional HMM/GMM system in both scenarios and for all examined training conditions. In the under-resourced scenario, the SGMM trained only on in-domain data is superior to the other tested approaches, even though those are boosted with data from another domain.
Similar resources
Idiap Research Report: Implementation of the Standard I-vector System for the Kaldi Speech Recognition Toolkit
This report describes an implementation of the standard i-vector-PLDA framework for the Kaldi speech recognition toolkit. The existing speaker recognition implementation is based on the Subspace Gaussian Mixture Model (SGMM) technique, although it shares many similarities with the standard implementation. In our implementation, we modified the code so that it mimics the standard algo...
Idiap Research Report: Implementation of VTLN for Statistical Speech Synthesis
Vocal tract length normalization (VTLN) is an important feature normalization technique that can be used to perform speaker adaptation when very little adaptation data is available. Earlier work showed that VTLN can be applied to statistical speech synthesis and that it gives additive improvements to CMLLR. This paper presents an EM optimization for estimating more accurate warping factors. The E...
Idiap Research Report: Application of Out-of-language Detection to Spoken-term Detection
This paper investigates the detection of English spoken terms in a conversational multi-language scenario. The speech is processed using a large vocabulary continuous speech recognition system. The recognition output is represented as word recognition lattices, which are then searched for the required terms. Due to the potential multi-lingual speech segments at the input, the spoken te...
Noise Compensation for Subspace Gaussian Mixture Models
Joint uncertainty decoding (JUD) is an effective model-based noise compensation technique for conventional Gaussian mixture model (GMM) based speech recognition systems. In this paper, we apply JUD to subspace Gaussian mixture model (SGMM) based acoustic models. The total number of Gaussians in the SGMM acoustic model is usually much larger than for conventional GMMs, which limits the applicati...
Speaker vectors from subspace Gaussian mixture model as complementary features for language identification
In this paper, we explore new high-level features for language identification. The recently introduced Subspace Gaussian Mixture Models (SGMMs) provide an elegant and efficient way of GMM acoustic modelling, with mean supervectors represented in a low-dimensional representative subspace. SGMMs also provide an efficient way of speaker adaptation by means of low-dimensional vectors. In our framewo...